    Building a Test Collection for Significant-Event Detection in Arabic Tweets

    With the increasing popularity of microblogging services like Twitter, researchers discovered a rich medium for tackling real-life problems like event detection. However, event detection in Twitter is often obstructed by the lack of public evaluation mechanisms such as test collections (sets of tweets, labels, and queries to measure the effectiveness of an information retrieval system). The problem is more evident when non-English languages, e.g., Arabic, are concerned. With the recent surge of significant events in the Arab world, news agencies and decision makers rely on Twitter's microblogging service to obtain recent information on events. In this thesis, we address the problem of building a test collection of Arabic tweets (named EveTAR) for the task of event detection. To build EveTAR, we first adopted an adequate definition of an event, which is a significant occurrence that takes place at a certain time; an occurrence is significant if there are news articles about it. We collected Arabic tweets using Twitter's streaming API. Then, we identified a set of events from the Arabic data collection using Wikipedia's current events portal. Corresponding tweets were extracted by querying the Arabic data collection with a set of manually-constructed queries. To obtain relevance judgments for those tweets, we leveraged CrowdFlower's crowdsourcing platform. Over a period of 4 weeks, we crawled over 590M tweets, from which we identified 66 events that cover 8 different categories and gathered more than 134k relevance judgments. Each event contains an average of 779 relevant tweets. Over all events, we got an average Kappa of 0.6, which indicates substantial inter-annotator agreement. EveTAR was used to evaluate three state-of-the-art event detection algorithms. The best-performing algorithms achieved 0.60 in F1 measure and 0.80 in both precision and recall. We plan to make our test collection available for research, including event descriptions, manually-crafted queries to extract potentially-relevant tweets, and all judgments per tweet. EveTAR is the first Arabic test collection built from scratch for the task of event detection. Additionally, we show in our experiments that it supports other tasks like ad-hoc search.
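    The thesis reports an average Kappa of 0.6 over the crowdsourced judgments. As a rough illustration of how such agreement can be computed, the sketch below implements Cohen's Kappa for two annotators' binary relevance labels; the label arrays are hypothetical, and EveTAR aggregated judgments from multiple crowd workers, so this shows only the pairwise core of the calculation.

```python
# Minimal sketch: Cohen's Kappa for two annotators' binary relevance labels.
# The label arrays are hypothetical; a crowdsourced collection would aggregate
# judgments from more than two workers.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled the same.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    return (p_o - p_e) / (1 - p_e)

# 1 = relevant, 0 = not relevant (hypothetical judgments for ten tweets)
a = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
print(f"kappa = {cohens_kappa(a, b):.2f}")  # ~0.58, "moderate" agreement
```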

    Towards NLP-based Semi-automatic Preparation of Content for Language Learning using LingoSnacks m-Learning Platform

    Vocabulary growth is an important element of language learning, but it requires repeated and varied exposure to new words and their usage in different contexts. However, preparing suitable learning content for effective language learning remains a challenging and time-consuming task. This paper reports the experience of designing and developing an m-Learning platform (named LingoSnacks) for semi-automatic preparation of language-learning content using Natural Language Processing (NLP) services. The LingoSnacks Authoring Tools provide an environment for assisted authoring of learning content and for delivering it to the learner in game-like interactive learning activities. Empirical testing results from teachers who used LingoSnacks indicate that the participants were able to ease their lesson preparation tasks. The resulting learning packages also helped learners with vocabulary acquisition: the number of new words that they could recognize, recall, and retain was significantly higher than for participants who used only conventional lessons in a classroom.
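    The paper does not detail its NLP pipeline, so the following is only a hypothetical sketch of what one semi-automatic authoring step might look like: it uses spaCy (a stand-in for the platform's unspecified NLP services) to extract candidate vocabulary words with an example sentence for each, which a teacher could then curate.

```python
# Hypothetical sketch of an NLP-assisted authoring step: extract candidate
# vocabulary (content-word lemmas) plus an example sentence for each, as a
# starting point a teacher could then curate. spaCy stands in for the
# unspecified NLP services; it is not necessarily what LingoSnacks uses.
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

def extract_vocabulary(text, pos_tags=("NOUN", "VERB", "ADJ")):
    doc = nlp(text)
    vocab = {}
    for sent in doc.sents:
        for token in sent:
            if token.pos_ in pos_tags and not token.is_stop:
                # Keep the first sentence in which each lemma appears.
                vocab.setdefault(token.lemma_.lower(), sent.text.strip())
    return vocab

text = "The curious fox explored the quiet forest. It discovered a hidden stream."
for word, example in extract_vocabulary(text).items():
    print(f"{word}: {example}")
```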

    PROVOKE: Toxicity trigger detection in conversations from the top 100 subreddits

    Promoting healthy discourse on community-based online platforms like Reddit can be challenging, especially when conversations show ominous signs of toxicity. Therefore, in this study, we find the turning points (i.e., toxicity triggers) that make conversations toxic. Before finding toxicity triggers, we built and evaluated various machine learning models to detect toxicity from Reddit comments. Subsequently, we used our best-performing model, a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model that achieved an area under the receiver operating characteristic curve (AUC) score of 0.983, to detect toxicity. Next, we constructed conversation threads and used the toxicity prediction results to build a training set for detecting toxicity triggers. This procedure entailed using our large-scale dataset to refine the definition of toxicity triggers and to build a trigger detection dataset from 991,806 conversation threads in the top 100 communities on Reddit. Then, we extracted a set of sentiment-shift, topical-shift, and context-based features from the trigger detection dataset, using them to build a dual-embedding biLSTM neural network that achieved an AUC score of 0.789. Our analysis of the trigger detection dataset showed that specific triggering keywords are common across all communities, like ‘racist’ and ‘women’. In contrast, other triggering keywords are specific to certain communities, like ‘overwatch’ in r/Games. The implications are that toxicity trigger detection algorithms can leverage generic approaches but must also tailor detections to specific communities.
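    As an illustration of the first stage (comment-level toxicity detection), the sketch below scores comments with a publicly available toxicity classifier through the Hugging Face pipeline API; the model named here is a stand-in, not the paper's fine-tuned BERT, and the 0.5 threshold is an arbitrary choice.

```python
# Sketch of comment-level toxicity scoring with a transformer classifier.
# "unitary/toxic-bert" is a publicly available stand-in, NOT the paper's
# fine-tuned model; the 0.5 threshold is likewise an arbitrary choice.
from transformers import pipeline

classifier = pipeline("text-classification", model="unitary/toxic-bert")

comments = [
    "Thanks for the detailed explanation, that really helped.",
    "You are an idiot and everyone here hates you.",
]
for comment in comments:
    result = classifier(comment)[0]  # e.g. {"label": "toxic", "score": 0.97}
    toxic = result["label"] == "toxic" and result["score"] > 0.5
    print(f"toxic={toxic} score={result['score']:.3f} :: {comment}")
```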

    The illusion of data validity: Why numbers about people are likely wrong

    This reflection article addresses a difficulty faced by scholars and practitioners who work with numbers about people: those who study people want numerical data about them, yet time and time again this numerical data is wrong. Addressing the potential causes of this wrongness, we present examples of analyzing people numbers, i.e., numbers derived from digital data by or about people, and discuss the comforting illusion of data validity. We first lay a foundation by highlighting potential inaccuracies in collecting people data, such as selection bias. Then, we discuss inaccuracies in analyzing people data, such as the flaw of averages, followed by a discussion of errors made when trying to make sense of people data through techniques such as posterior labeling. Finally, we discuss a root cause of people data often being wrong: the conceptual conundrum of thinking the numbers are counts when they are actually measures. Practical solutions to address this illusion of data validity are proposed. The implications for theories derived from people data are also highlighted, namely that these people theories are generally wrong, as they are often derived from people numbers that are wrong.
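    To make the flaw of averages concrete, here is a small made-up example: when a people metric is bimodal, its mean can describe almost no actual person.

```python
# Hypothetical illustration of the "flaw of averages": with a bimodal metric,
# the mean describes almost no one. The session times below are made up.
import statistics

# Two user groups: quick bouncers (~1 min) and engaged readers (~20 min).
session_minutes = [1, 1, 2, 1, 2, 19, 21, 20, 22, 18]

mean = statistics.mean(session_minutes)
print(f"mean session time: {mean:.1f} min")  # 10.7 min

# How many users are actually within 3 minutes of the "average user"?
near_mean = [t for t in session_minutes if abs(t - mean) <= 3]
print(f"users near the mean: {len(near_mean)} of {len(session_minutes)}")  # 0 of 10
```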

    How Do Users Perceive Deepfake Personas? Investigating the Deepfake User Perception and Its Implications for Human-Computer Interaction

    Although deepfakes have a negative connotation in human-computer interaction (HCI) due to their risks, they also involve many opportunities, such as communicating user needs in the form of a “living, talking” deepfake persona. To scope and better understand these opportunities, we present a qualitative analysis of 46 participants’ think-aloud transcripts based on interacting with deepfake personas and human personas, representing a potentially beneficial application of deepfakes for HCI. Our qualitative analysis of 92 think-aloud records indicates five central user deepfake themes, including (1) Realism, (2) User Needs, (3) Distracting Properties, (4) Added Value, and (5) Rapport. The results indicate various challenges in deepfake user perception that technology developers need to address before the potential of deepfake applications can be realized for HCI.

    Mapping online hate: A scientometric analysis on research trends and hotspots in research on online hate

    Internet and social media participation open doors to a plethora of positive opportunities for the general public. However, in addition to these positive aspects, digital technology also provides an effective medium for spreading hateful content in the form of cyberbullying, bigotry, hateful ideologies, and harassment of individuals and groups. This research aims to investigate the growing body of online hate research (OHR) by mapping general research indices, prevalent themes of research, research hotspots, and influential stakeholders such as organizations and contributing regions. For this, we use scientometric techniques and collect research papers from the Web of Science core database published through March 2019. We apply a predefined search strategy to retrieve peer-reviewed OHR and analyze the data using CiteSpace software by identifying influential papers, themes of research, and collaborating institutions. Our results show that higher-income countries contribute most to OHR, with Western countries accounting for most of the publications, funded by North American and European funding agencies. We also observed increased research activity post-2005, growing from more than 50 publications to more than 550 in 2018; this applies to the number of publications as well as citations. The hotbeds of OHR focus on cyberbullying, social media platforms, co-morbid mental disorders, and profiling of aggressors and victims. Moreover, we identified four main clusters of OHR: (1) Cyberbullying, (2) Sexual solicitation and intimate partner violence, (3) Deep learning and automation, and (4) Extremist and online hate groups, which highlight the cross-disciplinary and multifaceted nature of OHR as a field of research. The research has implications for researchers and policymakers engaged in OHR and its associated problems for individuals and society.
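    As a simplified illustration of the publication-trend part of such an analysis (not the CiteSpace workflow itself), the sketch below counts publications per year from a hypothetical Web of Science CSV export; PY is the standard Web of Science field tag for publication year, but the file name and layout here are assumptions.

```python
# Simplified sketch of the publication-trend part of a scientometric analysis.
# This is NOT the CiteSpace workflow; the file name and column layout are
# assumptions about a Web of Science CSV export (PY = publication year).
import csv
from collections import Counter

def publications_per_year(path):
    with open(path, newline="", encoding="utf-8") as f:
        years = [row["PY"] for row in csv.DictReader(f) if row.get("PY")]
    return Counter(years)

counts = publications_per_year("wos_online_hate_export.csv")
for year in sorted(counts):
    print(f"{year}: {counts[year]} publications")
```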

    Developing an online hate classifier for multiple social media platforms

    The proliferation of social media enables people to express their opinions widely online. However, at the same time, this has resulted in the emergence of conflict and hate, making online environments uninviting for users. Although researchers have found that hate is a problem across multiple platforms, there is a lack of models for online hate detection using multi-platform data. To address this research gap, we collect a total of 197,566 comments from four platforms: YouTube, Reddit, Wikipedia, and Twitter, with 80% of the comments labeled as non-hateful and the remaining 20% labeled as hateful. We then experiment with several classification algorithms (Logistic Regression, Naïve Bayes, Support Vector Machines, XGBoost, and Neural Networks) and feature representations (Bag-of-Words, TF-IDF, Word2Vec, BERT, and their combination). While all the models significantly outperform the keyword-based baseline classifier, XGBoost using all features performs the best (F1 = 0.92). Feature importance analysis indicates that BERT features are the most impactful for the predictions. Findings support the generalizability of the best model, as the platform-specific results from Twitter and Wikipedia are comparable to their respective source papers. We make our code publicly available for application in real software systems as well as for further development by online hate researchers.
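    A minimal version of one of the reported configurations (TF-IDF features fed to XGBoost) could look like the sketch below; the toy comments and labels are placeholders for the actual 197,566-comment dataset, and the paper's best model additionally combined BERT and other features.

```python
# Minimal sketch of one configuration from the paper (TF-IDF -> XGBoost).
# The toy comments and labels are placeholders for the real dataset; the
# paper's best model combined more feature types, including BERT.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from xgboost import XGBClassifier

comments = ["have a nice day", "you people are disgusting", "great point, thanks",
            "get out of here, trash", "interesting article", "nobody wants you here"]
labels = [0, 1, 0, 1, 0, 1]  # 1 = hateful, 0 = non-hateful (toy labels)

X_train, X_test, y_train, y_test = train_test_split(
    comments, labels, test_size=0.33, stratify=labels, random_state=42)

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
clf = XGBClassifier(n_estimators=100, max_depth=3)
clf.fit(vectorizer.fit_transform(X_train), y_train)

preds = clf.predict(vectorizer.transform(X_test))
print("F1:", f1_score(y_test, preds))
```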

    EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

    This article introduces a new language-independent approach for creating a large-scale, high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. That inherently facilitates the support of multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually-crafted queries per topic, collecting high-quality annotations via crowd-workers for relevancy and in-house annotators for novelty, filtering out low-agreement topics and inaccessible tweets, and providing multiple subsets of the collection for better availability. Applying our methodology on Arabic tweets resulted in EveTAR, the first freely-available tweet test collection for multiple IR tasks. EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events for which about 62K tweets were judged with substantial average inter-annotator agreement (Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating existing algorithms in the respective tasks. Results indicate that the new collection can support reliable ranking of IR systems that is comparable to similar TREC collections, while providing strong baseline results for future studies over Arabic tweets.
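    As an illustration of how such a collection supports ad-hoc search evaluation, the sketch below computes average precision for a single topic from a ranked run and a set of relevance judgments; the tweet IDs are invented, and a full evaluation would average this over all topics.

```python
# Sketch of ad-hoc search evaluation against a test collection: average
# precision (AP) for a single topic. Tweet IDs and judgments are invented;
# a real evaluation would average AP over all topics (MAP).
def average_precision(ranked_ids, relevant_ids):
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

qrels = {"t3", "t7", "t9"}            # tweets judged relevant for the topic
run = ["t3", "t1", "t7", "t4", "t9"]  # system's ranked retrieval results
print(f"AP = {average_precision(run, qrels):.3f}")  # 0.756
```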

    Game-based micro-learning approach for language vocabulary acquisition using LingoSnacks

    Acquisition of new vocabulary is an important element of language learning, but it requires repeated and varied exposure to new words and their usage. This paper reports the experience of designing and developing a game-based micro-learning platform (named LingoSnacks) for interactive learning of Arabic vocabulary. The LingoSnacks learning platform provides an environment for authoring learning content and delivering it to the learner in game-like interactive learning activities. A usability evaluation was conducted to refine the system and improve its usability. Empirical testing results for students who used LingoSnacks indicate that the participants were able to increase their rate of vocabulary acquisition: the number of new words that they could recognize, recall, and retain was significantly higher than for participants who used only conventional lessons in a classroom.